In silico SNP prediction of selected protein orthologues in insect models for Alzheimer's, Parkinson's, and Huntington’s diseases

Alzheimer's, Parkinson's, and Huntington's diseases are incurable neurodegenerative disorders; Alzheimer's and Parkinson's are the most common and mainly affect the elderly population. Discovering effective treatments for these diseases is often difficult, expensive, and serendipitous. Previous comparative studies of different model organisms have revealed that most animals share similar cellular and molecular characteristics, which raises the possibility of using insect models to investigate neurodegenerative diseases. The Meta-SNP tool, which integrates four tools (SIFT, PANTHER, SNAP, and PhD-SNP), was used to identify non-synonymous single nucleotide polymorphisms (nsSNPs). nsSNP prediction was conducted on three representative proteins for Alzheimer's, Parkinson's, and Huntington's diseases: APPl in Drosophila melanogaster, LRRK1 in Aedes aegypti, and VCPl in Tribolium castaneum. From the comparative protein analysis across insect models and the nsSNP analyses, we conclude that D. melanogaster is the best model for Alzheimer's disease, with five predicted deleterious nsSNPs among the 21 suggested mutations in the APPl protein; A. aegypti is the best model for Parkinson's disease, with three nsSNPs in the LRRK1 protein; and T. castaneum is the best model for Huntington's disease, with 13 nsSNPs among 37 suggested mutations in the VCPl protein. This study aimed to improve human neural health by identifying the best insect to model Alzheimer's, Parkinson's, and Huntington's diseases.


Materials and methods
This study was approved by the research ethics committee of the Faculty of Science, Ain Shams University (ASU-SCI/ENTO/2023/8/1).
Figure 1. The sequence of analyses performed.

Bioinformatics analyses
1. Pairwise alignment was performed to detect protein homology and to determine query coverage and percentage protein identity. Each H. sapiens protein was aligned against its homolog in the selected model organisms using BLASTP (Basic Local Alignment Search Tool for proteins) from NCBI with default parameters 32, except for GRN and NEP2, which were aligned against D. melanogaster because they showed no alignment against H. sapiens.
2. In silico prediction of disease-causing variants was performed using the publicly available tool Meta-SNP (meta-predictor of disease-causing variants) 27,35,36. This tool detects disease-associated nsSNVs for both well-identified and predicted amino acid sequences (SNPs based on human dbSNP, accessed 22 June 2023). It differs from other methods by integrating four existing methods, PANTHER, PhD-SNP, SIFT, and SNAP, with the following default thresholds: PANTHER, PhD-SNP, and Meta-SNP output a score between 0 and 1 (if > 0.5, the mutation is predicted as Disease); SIFT outputs a positive value (if > 0.05, the mutation is predicted as Neutral); SNAP output is normalised between 0 and 1 (if > 0.5, the mutation is predicted as Disease).
(a) A local alignment search between the region around each substituted amino acid in the human protein and its homolog in the selected insect model was performed using BLASTP and manual search. The best match was found using the 5 amino acids before and the 5 amino acids after the substituted residue, a short sequence that locates the accurate position of the corresponding amino acid in the insect protein.
(b) The matched amino acid substitutions and the insect protein sequence were entered into Meta-SNP to determine the probability that each substitution, corresponding to a human nsSNP, causes disease.
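The position-mapping step in (a) can be sketched in Python. This is a minimal sketch under stated assumptions: it replaces BLASTP and the manual search with a simple ungapped sliding-window identity count, and the function names (`context_window`, `best_ungapped_match`, `map_position`) and toy sequences are hypothetical, not part of the authors' workflow.

```python
def context_window(seq: str, pos: int, flank: int = 5) -> tuple[str, int]:
    """Return the residues around 1-based `pos` (up to `flank` on each side)
    and the index of the substituted residue inside that window."""
    if not 1 <= pos <= len(seq):
        raise ValueError("position outside sequence")
    start = max(0, pos - 1 - flank)      # clip at the N-terminus
    end = min(len(seq), pos + flank)     # clip at the C-terminus
    return seq[start:end], pos - 1 - start


def best_ungapped_match(window: str, target: str) -> tuple[int, int]:
    """Slide `window` along `target` without gaps; return the 0-based start
    with the most identical residues and that identity count."""
    if len(window) > len(target):
        raise ValueError("window longer than target sequence")
    best_start, best_score = 0, -1
    for start in range(len(target) - len(window) + 1):
        segment = target[start:start + len(window)]
        score = sum(a == b for a, b in zip(window, segment))
        if score > best_score:           # keep the first of any tied starts
            best_start, best_score = start, score
    return best_start, best_score


def map_position(human_seq: str, human_pos: int, insect_seq: str,
                 flank: int = 5) -> tuple[int, str, int]:
    """Map a 1-based human residue position onto the insect homolog.
    Returns (insect position, insect residue, identities in the window)."""
    window, centre = context_window(human_seq, human_pos, flank)
    start, score = best_ungapped_match(window, insect_seq)
    insect_pos = start + centre + 1      # back to 1-based numbering
    return insect_pos, insect_seq[insect_pos - 1], score


if __name__ == "__main__":
    # Toy sequences for illustration only (not real APP, LRRK or VCP sequences).
    human = "MSTLVRQEWKLNDG"
    insect = "GGGAMSTLVRQEWKAPP"
    print(map_position(human, 6, insect))  # (10, 'R', 10)
```

The returned identity count gives a crude check that the 11-residue window really matched; a real analysis would rely on BLASTP's local alignment, which also handles gaps.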

Pairwise alignment
Pairwise alignment using BLASTP with default parameters was conducted for each selected insect protein against its human homolog, except for GRN and NEP2, which were aligned against the fruit fly (Tables S4, S5, and S6). Strict cut-offs of 75% query coverage and 30% protein identity were applied to retain the more meaningful homology results [37][38][39].
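The cut-off filter can be expressed as a small Python sketch. It assumes the BLASTP hits were exported as tab-separated text with the columns `qseqid sseqid pident qcovs` (for example via BLAST+ `-outfmt "6 qseqid sseqid pident qcovs"`) and treats both thresholds as inclusive; both choices are assumptions, and the helper names are hypothetical.

```python
from dataclasses import dataclass

QUERY_COVER_MIN = 75.0  # % query coverage cut-off used in the text
IDENTITY_MIN = 30.0     # % protein identity cut-off used in the text


@dataclass(frozen=True)
class Hit:
    query: str          # human (or fruit fly) protein accession
    subject: str        # insect protein accession
    identity: float     # percent identity (pident)
    query_cover: float  # percent query coverage (qcovs)


def parse_hit(line: str) -> Hit:
    """Parse one tab-separated line: qseqid, sseqid, pident, qcovs."""
    qseqid, sseqid, pident, qcovs = line.rstrip("\n").split("\t")
    return Hit(qseqid, sseqid, float(pident), float(qcovs))


def passes_cutoff(hit: Hit) -> bool:
    """Keep hits meeting both the coverage and the identity cut-offs."""
    return hit.query_cover >= QUERY_COVER_MIN and hit.identity >= IDENTITY_MIN


if __name__ == "__main__":
    # Hypothetical hits, not results from the study.
    lines = ["QUERY1\tSUBJ_A\t45.2\t88",
             "QUERY1\tSUBJ_B\t52.0\t60",
             "QUERY1\tSUBJ_C\t25.0\t95"]
    kept = [h.subject for h in map(parse_hit, lines) if passes_cutoff(h)]
    print(kept)  # ['SUBJ_A']
```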
The results were as follows. For Alzheimer's disease

In Silico nsSNPs prediction
In silico nsSNP prediction was performed using five integrated tools (SIFT, PANTHER, SNAP, PhD-SNP, and Meta-SNP).
Polymorphism data for the APP (NP_001191231.1), LRRK2 (NP_940980.4), and VCP (NP_009057.1) proteins were retrieved from the publicly available NCBI dbSNP database. APP was found to contain four missense SNPs in its coding region, and LRRK2 was found to have one. VCP was found to have five missense SNPs in its coding region, but two of them (rs779834525 and rs1420316004) were related to the FANCG gene rather than VCP.
Predictions with the SIFT, PANTHER, SNAP, PhD-SNP, and Meta-SNP tools were obtained from 21 suggested input mutations. Eleven of these mutations were predicted, and five of the 11 were predicted to be deleterious or disease-associated (Table 4). Mutations V94F, V94L, A758G, A758V, and A820G are predicted to be pathogenic in the AD D. melanogaster model according to PANTHER, PhD-SNP, and Meta-SNP, whereas SIFT and SNAP could not identify the effects of these nsSNPs. A820G has the highest reliability index. For Parkinson's disease, SNP analysis was performed on the A. aegypti LRRK1 protein (XP_021698550.1), a homolog of H. sapiens LRRK2 with 27.47% protein identity, using the reference human SNP rs33939927 (R > S,G,C).
1. In rs33939927, the human substitutions Arg1441Ser, Arg1441Gly, and Arg1441Cys correspond to Arg at position 1218 in A. aegypti.
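The mapping in item 1 turns one human allele record into a set of insect substitution labels. The Python sketch below builds labels in the one-letter wild-type/position/mutant form used in the text (e.g. R1218C). The helper names are hypothetical, the check that the insect residue equals the human reference residue is an added safeguard, and whether Meta-SNP expects exactly this label format is an assumption.

```python
def parse_allele(spec: str) -> tuple[str, list[str]]:
    """Split a dbSNP-style protein allele such as 'R > S,G,C'
    into the reference residue and the list of alternative residues."""
    ref, alts = (part.strip() for part in spec.split(">"))
    return ref, [a.strip() for a in alts.split(",")]


def insect_substitutions(insect_seq: str, insect_pos: int,
                         ref: str, alts: list[str]) -> list[str]:
    """Build labels like 'R1218C' after checking the mapped residue."""
    residue = insect_seq[insect_pos - 1]   # 1-based position
    if residue != ref:
        raise ValueError(f"expected {ref} at {insect_pos}, found {residue}")
    return [f"{ref}{insect_pos}{alt}" for alt in alts]


if __name__ == "__main__":
    # Toy stand-in for the A. aegypti LRRK1 sequence: only position 1218 matters.
    toy_seq = "A" * 1217 + "R" + "A" * 10
    ref, alts = parse_allele("R > S,G,C")
    print(insect_substitutions(toy_seq, 1218, ref, alts))
    # ['R1218S', 'R1218G', 'R1218C']
```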
Predictions with the SIFT, PANTHER, SNAP, PhD-SNP, and Meta-SNP tools were obtained from three suggested input mutations. All three were predicted to be deleterious or disease-associated (Table 5). Mutations R1218C, R1218G, and R1218S are predicted to be pathogenic in the PD A. aegypti model according to PhD-SNP and Meta-SNP, whereas PANTHER, SIFT, and SNAP could not identify the effects of these nsSNPs. R1218C has the highest reliability index.
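The default thresholds listed in the Methods can be applied to raw tool outputs to see which tools support a Disease call for each substitution, as in the results above where some tools returned no usable prediction. This Python sketch assumes each tool's output has already been reduced to one numeric score (or `None` when the tool gave no prediction); the scores in the example are invented, and the function names are hypothetical.

```python
from typing import Optional

PROB_TOOLS = {"PANTHER", "PhD-SNP", "Meta-SNP", "SNAP"}  # > 0.5 means Disease


def call(tool: str, score: Optional[float]) -> Optional[str]:
    """Apply the default thresholds stated in the Methods to one score."""
    if score is None:
        return None                      # tool could not classify the variant
    if tool == "SIFT":
        return "Neutral" if score > 0.05 else "Disease"
    if tool in PROB_TOOLS:
        return "Disease" if score > 0.5 else "Neutral"
    raise ValueError(f"unknown tool: {tool}")


def disease_support(scores: dict[str, Optional[float]]) -> list[str]:
    """Return the tools that call this substitution Disease, sorted by name."""
    return sorted(t for t, s in scores.items() if call(t, s) == "Disease")


if __name__ == "__main__":
    # Invented scores for one substitution; not values from Tables 4 or 5.
    example = {"SIFT": None, "SNAP": None, "PANTHER": 0.71,
               "PhD-SNP": 0.83, "Meta-SNP": 0.77}
    print(disease_support(example))  # ['Meta-SNP', 'PANTHER', 'PhD-SNP']
```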
RI: a reliability index between 0 and 10 that helps focus on the most accurate predictions.
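The reliability index can be read as a measure of how far a prediction lies from the decision boundary. The Python sketch below assumes RI is the distance of the disease probability from 0.5, rescaled linearly to the 0 to 10 range and rounded; this formula is an assumption for illustration and may not match Meta-SNP's exact definition.

```python
def reliability_index(probability: float) -> int:
    """Hypothetical RI: |p - 0.5| rescaled to 0..10 and rounded."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(20 * abs(probability - 0.5))


if __name__ == "__main__":
    for p in (0.5, 0.8, 1.0):
        print(p, reliability_index(p))
```

Under this assumption, scores near 0 or 1 give an RI near 10, and scores near the 0.5 threshold give an RI near 0.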

Discussion
Neurodegenerative diseases are devastating, incurable, and mostly fatal. To accelerate the search for treatments and save money, effort, and time, there is a need to determine the best model that mimics human disease, which in turn can improve human neural health. Each protein was aligned pairwise against its human homolog, except GRN and NEP2, which were aligned against the fruit fly because they showed no alignment against H. sapiens. We determined the best insect for studying each protein separately by selecting the hit with the highest query coverage and the highest protein identity.
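Selecting the best insect for one protein can be sketched as follows. The text does not state how coverage and identity are weighed against each other, so this Python sketch assumes hits must first pass the 75% coverage and 30% identity cut-offs and then ranks them by identity, with coverage as a tie-breaker; the function name and toy values are hypothetical.

```python
from typing import Optional

Hit = tuple[str, float, float]  # (organism, % identity, % query coverage)


def best_model(hits: list[Hit], min_cover: float = 75.0,
               min_identity: float = 30.0) -> Optional[str]:
    """Return the organism with the highest identity (then coverage)
    among hits passing both cut-offs, or None if none pass."""
    eligible = [h for h in hits if h[2] >= min_cover and h[1] >= min_identity]
    if not eligible:
        return None
    return max(eligible, key=lambda h: (h[1], h[2]))[0]


if __name__ == "__main__":
    # Toy values for one protein; not taken from Tables 1 to 3.
    toy_hits = [("D. melanogaster", 40.0, 90.0),
                ("A. mellifera", 45.0, 80.0),
                ("G. mellonella", 50.0, 60.0)]   # fails the coverage cut-off
    print(best_model(toy_hits))  # A. mellifera
```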
In this study, a total of eight insect models were compared to determine which best models each of AD, PD, and HD.
For Alzheimer's, the two best overall models according to the average protein identity percentage for the 10 selected proteins were D. melanogaster, followed by A. gambiae. Drosophila melanogaster is believed to have functional homologs of nearly 75% of human disease-causing genes 15,40,41. The fruit fly showed high protein identity to human with reasonable query coverage for GRN, COL25A1, MAPT, and RAC1, and flies can express different phenotypes of induced AD 15. Of the 10 proteins, APP was selected as a representative AD-related protein in humans. The analysis of nsSNPs in the fruit fly APPl protein predicted pathogenic nsSNPs (V94F, V94L, A758G, A758V, and A820G) that could be used in further studies on inducing familial forms of early-onset Alzheimer's disease and cerebral amyloid angiopathy, and on the factors that increase total Aβ levels 42,43. Anopheles gambiae is known to have become an important model organism for the study of insect-parasite 45. This in turn makes A. gambiae a potential model for studying AD pathology. For Parkinson's, the two best models according to the average protein identity percentage for the 13 selected proteins were A. aegypti, followed by A. mellifera. A. aegypti has an advanced nervous system, with sensory organs used to locate hosts in its environment 46. When a sublethal dose of spinosyn insecticides was applied to A. aegypti, Parkinson's disease-related genes were significantly enriched in spinetoram-exposed mosquitoes compared with controls 47. In our analysis, A. aegypti showed high protein identity to human with reasonable query coverage for PARK6, VPS35, ATP13A2, and PLA2G6. Of the 13 proteins, LRRK2 was selected as a representative PD-related protein in humans. The analysis of nsSNPs in the yellow fever mosquito LRRK1 protein predicted pathogenic nsSNPs (R1218C, R1218G, and R1218S) that could be used to induce PD through mutations in the catalytic domains, which may hyperactivate the kinase domain and produce Lewy body pathology 48. Apis mellifera is more similar to vertebrates in terms of RNA (ribonucleic acid) interference, DNA (deoxyribonucleic acid) methylation, and circadian rhythm 49. It showed high protein identity to human with reasonable query coverage for PARK2, VPS35, and ATP13A2. Ethanol exposure in honey bees changes their body and wing kinematics 50, and mechanisms identified in the cellular stress response to ethanol, such as the oxidative stress response, are also involved in Parkinson's disease 51. Apis mellifera is also a key social behavioural model that displays sophisticated cognitive abilities 52. This makes it possible to analyse the changes that occur in honeybee brains during learning and memory, which, together with the ability to identify new genome-based single-nucleotide polymorphisms (SNPs), increases its potential as a model for AD 14,53.
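The ranking by average protein identity used above can be reproduced with a few lines of Python. This sketch assumes the identity percentages are held in a nested dictionary and that a protein with no homolog passing the cut-offs is left out of that insect's average rather than scored as zero; the text does not specify this, and the example values are placeholders, not the study's data.

```python
from statistics import mean


def rank_by_average_identity(
        identities: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """identities[organism][protein] = % identity to the human protein.
    Return (organism, average identity) pairs, best first; organisms
    with no retained homologs are skipped."""
    averages = {org: mean(per_protein.values())
                for org, per_protein in identities.items() if per_protein}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    placeholder = {"insect A": {"P1": 40.0, "P2": 50.0},
                   "insect B": {"P1": 60.0},
                   "insect C": {}}
    print(rank_by_average_identity(placeholder)[:2])  # two best models
```

Skipping missing proteins favours insects with few but close homologs; scoring absences as zero would instead penalise them, so this choice can change the ranking.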
For Huntington's, T. castaneum, followed by B. mori, were the best models according to the average protein identity percentage. Tribolium castaneum has more olfactory receptors and detoxification genes than D. melanogaster and other insects and may be better adapted to its environment 45. It shows higher genetic homology to humans than other invertebrate models such as D. melanogaster 54. Therefore, T. castaneum is one of the most suitable genetic models for post-genomic studies such as proteomics and functional genomics. It showed high protein identity to human with reasonable query coverage for GRIK2, VPS13A, and UBQLN2. Of the 10 proteins, VCP was selected as a representative HD-related protein in humans. The analysis of SNPs in the red flour beetle VCPl protein predicted pathogenic nsSNPs (R268C, R268G, R268S, R282C, R282G, R282S, R836C, R710C, R710G, R710S, R750C, R750G, and R750S) that could be used in further studies on the gene's role in cell division, apoptosis, and DNA damage repair, and on the build-up of abnormal proteins in muscle, bone, and brain cells that leads to HD. These protein aggregates interfere with the normal functions of brain cells 55,56. The PINK1 protein from T. castaneum (TcPINK1) exhibits catalytic activity toward ubiquitin, parkin, and generic substrates and provides a basis for further studies on human Parkinson's disease 57. Bombyx mori shares homologs of 58% of human disease genes, including genes related to neurodegenerative diseases such as HD, oxidative stress, and protein degradation 58. Bombyx mori VPS35 and UCHL1 are more identical to their H. sapiens homologs than those of D. melanogaster. Downregulation of the DJ-1 gene causes the p-translucent silkworm phenotype through an increased oxidative stress response, which leads to oxidative damage to nerves and tissues 17,18.
Galleria mellonella was not the best model for any of the three NDs studied, although it has an innate immune response similar to that of mammals, even though it evolved separately from mammals hundreds of millions of years ago [29][30][31]. Comparative genome studies have shown that it has numerous homologues of human genes encoding proteins involved in pathogen recognition or signal transduction 59,60. In our study, it showed high protein identity to human with reasonable query coverage for MAPT, ATP13A2, GIGYF2, and RAC1. In addition, its larvae can host bacteria such as Borrelia burgdorferi 61, Enterococcus faecalis 62, and Staphylococcus aureus 63, which are believed to play a role in neuroinflammation and may contribute to AD.
Musca domestica has a strong immune system and has been used as a model to investigate enhanced detoxification 64. Its larval extract has shown therapeutic effects against memory impairment, structural damage, and oxidative stress in an AD mouse model 65. In our study, it showed high protein identity to human with reasonable query coverage for RAC1, COL25A1, HDAC6, DJ-1, GRIK2, VPS13A, VCP, and UBQLN2.
These findings can assist in selecting the best model for further studies that simulate disease, deepen understanding of mutations and their effects, and explore how to correct them genetically or through improved drug discovery. The average percentage of protein identity between the different insect models and the selected proteins is shown in Figs. 5 and 6 and provided in the supplementary data.

Conclusion
The increasing prevalence of neurodegenerative diseases (NDs) such as Alzheimer's, Parkinson's, and Huntington's necessitates a better understanding of these diseases. Research on NDs follows two arms: one focuses on finding treatments that delay symptoms or prevent disease development, whereas the other searches for tools that can detect the earliest and indirect signs of disease, which is the focus of this study. It is therefore crucial to simulate the disease, identify the counterparts of human disease genes, and test and apply the findings in easily handled model organisms. Comparative analysis has the potential to improve research and drug development for human diseases.
In this study, a total of 61 SNPs were checked in the APPl, LRRK1, and VCPl proteins of D. melanogaster, A. aegypti, and T. castaneum, respectively, using five prediction tools; 21 out of 29 SNPs showed a deleterious effect, and 8 of these 21 showed a high reliability index. Most of the 21 deleterious nsSNPs are located in functional domains of the proteins.
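The totals in the previous paragraph can be checked against the per-protein counts reported in the Abstract and Results (21 suggested mutations with 5 deleterious for APPl, 3 with 3 for LRRK1, and 37 with 13 for VCPl). A short Python tally:

```python
# (suggested mutations, predicted deleterious nsSNPs) per protein, from the text
counts = {
    "APPl (D. melanogaster)": (21, 5),
    "LRRK1 (A. aegypti)": (3, 3),
    "VCPl (T. castaneum)": (37, 13),
}

total_checked = sum(suggested for suggested, _ in counts.values())
total_deleterious = sum(deleterious for _, deleterious in counts.values())

print(total_checked, total_deleterious)  # 61 21
```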
Although mammalian models are more similar to humans, insects are often preferred because of their shorter lifespan and fewer ethical constraints. Insect models of human disease provide new tools for drug discovery that can overcome current limitations: they can be used at different stages and show a significant response to many drugs that act on the mammalian central nervous system (CNS) despite differences in their brains, which allows researchers to find new therapeutic strategies.
In conclusion, A. mellifera, T. castaneum, B. mori, and A. aegypti, besides D. melanogaster, have a promising future in medical research and provide valuable insights into common neurodegenerative diseases such as AD and PD and rare diseases such as HD. This study provides comprehensive protein-level information on the available insect models and an analysis of predicted functional nsSNPs, with the aim of improving human neural health by finding the best insect model to study Alzheimer's, Parkinson's, and Huntington's diseases and answering complex biological questions such as the functional impacts of these variants. For example, the predicted nsSNPs can guide wet-lab experiments by identifying the proper positions to knock down, revealing their pathological effects, and determining which genes or proteins may be affected when one of the NDs is induced in its appropriate model.

Recommendation
To maximise the benefits, we recommend establishing stock centres for different insect models, mutant and transgenic strains, microarrays, and RNA interference libraries; updating annotations; and expanding genome sequencing and assembly for insects. We also recommend developing tools specific to insect model organisms.

Figure 5. The heatmap shows the percentage of protein identity for AD, PD, and HD between different insect models, where deep colour refers to high protein identity and light colour refers to low protein identity. The heatmap was generated using RStudio version 2022.12.0+353.

Figure 6. The diagram shows the average protein identity percentage between the selected insect models. The best overall insect models according to protein identity are the fruit fly for AD, the yellow fever mosquito for PD, and the red flour beetle for HD.

Table 1:
A. mellifera shows greater identity to H. sapiens than D. melanogaster for APP. Musca domestica has more identity with H. sapiens than D. melanogaster for COL25A1. Aedes aegypti is the nearest in identity to D. melanogaster for GRN. Musca domestica and A. aegypti show greater identity to H. sapiens than D. melanogaster for HDAC6. Galleria mellonella has the closest Tau/Mapt protein identity to H. sapiens besides D. melanogaster. Aedes aegypti shows the greatest identity to D. melanogaster for Nep2. Tribolium castaneum and B. mori have Psn1 and Psn2 proteins more identical to H. sapiens than those of D. melanogaster. For RAC1, T. castaneum, M. domestica, and G. mellonella were more identical to H. sapiens than D. melanogaster. In the absence of SORL1 in D. melanogaster, A. mellifera, T. castaneum, and A. aegypti showed higher identity with H. sapiens (Fig. 2). For Parkinson's disease (Table 2): M. domestica and A. gambiae show better protein identity to H. sapiens than D. melanogaster for DJ-1. Tribolium castaneum has greater GAK, HTRA2, LRRK2, and EIF4G1 protein identity to H. sapiens than D. melanogaster. For VPS35, A. mellifera showed the highest protein identity with H. sapiens. B. mori showed higher UCHL1 protein identity than D. melanogaster. For A. aegypti and G.

Table 1.
The protein identity percentages between human, Mus musculus, and the selected insect models for AD, with cut-offs of 75% query coverage and 30% protein identity; the fruit fly is included as a reference insect model. The highest protein identities are in italics and the highest query coverages are in bold. 1 Mus musculus, 2 Drosophila melanogaster, 3 Apis mellifera, 4 Tribolium castaneum, 5 Bombyx mori, 6 Musca domestica, 7 Galleria mellonella, 8 Anopheles gambiae, 9 Aedes aegypti.

Columns for each organism: Query cover (%), Length, Identity (%), ID.
mellonella, ATP13A2 showed more identity to H. sapiens than D. melanogaster. In addition, G. mellonella has better GIGYF2 protein identity with H. sapiens than D. melanogaster. A. gambiae showed greater identity to H. sapiens than D. melanogaster for PLA2G6 (Fig. 3). For Huntington's disease

Table 3:
A. mellifera has higher HTT, UBQLN2, and DMBK protein identity to H. sapiens than D. melanogaster.B. mori has better GRIK2 and ATXN1 protein identity to H. sapiens than D. melanogaster.Apis mellifera, A. gambiae, and A. aegypti were closer to H. sapiens than D.

Table 3.
The protein identity percentages between human, Mus musculus, and the selected insect models for HD, with cut-offs of 75% query coverage and 30% protein identity; the fruit fly is included as a reference insect model. The highest protein identities are in italics and the highest query coverages are in bold. 1 Mus musculus, 2 Drosophila melanogaster, 3 Apis mellifera, 4 Tribolium castaneum, 5 Bombyx mori, 6 Musca domestica, 7 Galleria mellonella, 8 Anopheles gambiae, 9 Aedes aegypti.

Table 5.
Predicted nsSNPs R1218C, R1218G, and R1218S in the A. aegypti LRRK1 protein. Italics indicate a disease effect and bold indicates a neutral effect.